Analysing Finnish with word lists: the DDI approach to morphology revisited

Author

  • Atro Voutilainen
Abstract

Morphological lexicons for morphologically complex languages provide good text coverage at the cost of overgeneration, difficulty of modification, and sometimes performance issues. Simple, manageable lexicon forms, especially lists, may appear unviable for such languages because the number of possible word-forms can be prohibitively high. We created and experimented with a list-based lexicon for a morphologically complex language (Finnish), and compared its coverage with that of a mature morphological analyser on new text in two experimental settings. The small observed difference in coverage suggests that simple, easy-to-modify list-based lexicons are viable as an initial part of morphological analysis, giving developers more control over the vast majority of input tokens.
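
As a rough illustration of what "coverage" of a list-based lexicon means here, the following minimal Python sketch computes the share of running tokens in a new text that are found in a plain word list. It is not the author's implementation: the function names `token_coverage` and `uncovered`, the lowercasing step, and the toy Finnish word list are all assumptions made for the example.

```python
from collections import Counter


def token_coverage(tokens, lexicon):
    """Return the fraction of running tokens whose lowercased form is in the lexicon."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token.lower() in lexicon)
    return known / len(tokens)


def uncovered(tokens, lexicon):
    """Count the lowercased tokens that the lexicon does not contain."""
    return Counter(t.lower() for t in tokens if t.lower() not in lexicon)


# Hypothetical toy data: a tiny list of Finnish word-forms and a short "new" text.
word_list = {"talo", "talossa", "talon", "on", "suuri"}
new_text = "Talo on suuri ja talossa on sauna".split()

coverage = token_coverage(new_text, word_list)
missing = uncovered(new_text, word_list)
print(f"coverage: {coverage:.3f}")
print("uncovered tokens:", dict(missing))
```

In a pipeline of the kind the abstract describes, the tokens reported as uncovered would be the ones passed on to a full morphological analyser, while the listed forms stay under direct developer control.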


Similar resources

Do We Need Discipline-Specific Academic Word Lists? Linguistics Academic Word List (LAWL)

This corpus-based study aimed at exploring the most frequently used academic words in linguistics and comparing the wordlist with the distribution of high-frequency words in Coxhead’s Academic Word List (AWL) and West’s General Service List (GSL) to examine their coverage within the linguistics corpus. To this end, a corpus of 700 linguistics research articles (LRAC), consisting of approximately ...

Material Development and English for Academic Purposes Word Lists; a Reductionist Approach

Nagy (1988) states that vocabulary is a prerequisite factor in comprehension. Drawing upon a reductionist approach and having in mind the prospects for material development, this study aimed at creating an English for Academic Purposes Word List (EAPWL). The corpus of this study was compiled from a corpus containing 6,479 pages of texts, 2,081,678 tokens (running words) and 63,825 types (...

Vocabulary Lists for EAP and Conversation Students

Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...

Neural dynamics of reading morphologically complex words

Despite considerable research interest, it is still an open issue as to how morphologically complex words such as "car+s" are represented and processed in the brain. We studied the neural correlates of the processing of inflected nouns in the morphologically rich Finnish language. Previous behavioral studies in Finnish have yielded a robust inflectional processing cost, i.e., inflected words ar...

Bertrand’s Paradox Revisited: More Lessons about that Ambiguous Word, Random

The Bertrand paradox question is: “Consider a unit-radius circle for which the length of a side of an inscribed equilateral triangle equals √3. Determine the probability that the length of a ‘random’ chord of a unit-radius circle has length greater than √3.” Bertrand derived three different ‘correct’ answers, the correctness depending on interpretation of the word, random. Here we employ geomet...


Publication date: 2018